As the COVID-19 pandemic disrupted life for most of 2020, countries tackled containment and spread in a variety of ways. But what makes one country choose a specific plan over others, and why is there such a wide array of attitudes toward a deadly disease affecting so many people? We set out to explore these questions using gender equality as a mechanism for choosing countries to study. New Zealand, a country with under two thousand current coronavirus cases, ranks 5th in the world for gender equality, while the United States, surging with over ten million cases, ranks 51st. Because of the drastic difference in coronavirus case counts and gender equality ratings, we explored the extent to which gender equality affects the attitudes and behaviors of men and women toward COVID-19. Specifically, we believe that the higher a country's gender equality rating, the more likely men and women are to share cautious viewpoints about COVID-19. While we used gender equality background information as a means to study the United States and New Zealand, our results do not indicate that gender equality underlies these differences, as there are several other confounding variables, such as cultural norms.
Our overall goal is to understand what makes one individual take more precautions than another during a health pandemic. We explored this question by creating a psychometric model that captures different attitudes and behaviors toward the COVID-19 pandemic. The variables used in this research come from the “Gender Differences in COVID-19 Related Attitudes and Behavior Dataset,” collected from a panel survey of eight OECD countries. The behavioral variables we studied were wearing gloves (weargloves), coughing and sneezing into an elbow (coughsneezeelbow), wearing a mask (wearmask), washing hands (washhands), not seeing friends (noseefriends), maintaining distance (distance), going out less (lessgoout), and avoiding crowded spaces (nocrowded). All of the behavioral variables were scored on a scale of 0 to 1, in intervals of 0.1; a score of 1 indicates an individual participates in this behavior completely and is thus more cautious. Many variables in the dataset were presented as attitudes on an agree/disagree scale of 0 or 1. The attitude variables we studied were agree quarantine, agree curfew, agree health checks, agree systematic testing, agree wear mask, agree health measures, agree forbid trip, agree stop public transport, agree close borders, agree close schools, agree close business, and agree close activities. A score of 1 indicated an individual agreed with the statement and was thus amenable to taking precautions.
For our initial psychometric model, we created three latent variables to capture different attitudes toward the COVID-19 pandemic. Our first latent variable, hygiene, was captured by weargloves, coughsneezeelbow, washhands, and wearmask. Our second latent variable, social life, comprised noseefriends, distance, lessgoout, and nocrowded. Our third latent variable, government control, was built by averaging similar agree/disagree variable scores to establish three artificial variables; taking the mean of similar attitude variables allowed us to create scaled variables. The first artificial variable, safety, was the mean of agree quarantine, agree curfew, agree health checks, agree systematic testing, agree wear mask, and agree health measures. The second, travel, was the mean of agree forbid trip, agree stop public transport, and agree close borders. Lastly, closing was the mean of agree close schools, agree close business, and agree close activities. These three artificial variables of safety, travel, and closing made up the government control category.
From our three latent variables of hygiene, social life, and government control, we aimed to capture the risk aversion variable. Risk aversion encompasses how easy or difficult it is for an individual to accept risk in health matters, with 1.0 indicating difficulty and 0.0 indicating making risky decisions with ease. Risk aversion is scaled from 0 to 1, in intervals of 0.1.
library(haven)
library(tidyverse)
library(dplyr)
library(readr)
library(tidyr)
library(stringr)
library(ggplot2)
library(psych)
library(yarrr)
library(lavaan)
library(lm.beta)
library(ggpubr)
library(corrr)
library(GPArotation)
library(plotly)
# Read the Stata (.dta) survey file
COVIDdataOrig <- read_dta("COVIDsurvey_CCpanel.dta")
# Keep only New Zealand and United States respondents and the study variables
COVIDdatafiltered <- COVIDdataOrig %>%
filter(country == "NZ" | country == "USA") %>%
select(weargloves, coughsneezeelbow, washhands, wearmask, agreequarantine, agreecurfew, agreehealthcheck, agreesystematictesting, agreewearingmask, healthmeasures, agreeforbidtrip,
agreepublictransportstop, agreecloseborders, agreeschoolclosed, agreebusinessclosed, agreecloseactivities, noseefriends, distance, lessgoout, nocrowded, riskaverse, country, id, female)
# For each respondent, average the related agree/disagree items into three
# artificial variables (safety, travel, closing), skipping missing answers
COVIDagree <- COVIDdatafiltered %>%
select(agreequarantine, agreecurfew, agreehealthcheck, agreesystematictesting, agreewearingmask, healthmeasures, agreeforbidtrip, agreepublictransportstop, agreecloseborders, agreeschoolclosed, agreebusinessclosed, agreecloseactivities, id) %>%
pivot_longer(agreequarantine:healthmeasures, names_to = "SafetyType", values_to = "Safety_Score") %>%
pivot_longer(agreeforbidtrip:agreecloseborders, names_to = "TravelType", values_to = "Travel_Score") %>%
pivot_longer(agreeschoolclosed:agreecloseactivities, names_to = "ClosingType", values_to = "Closing_Score") %>%
group_by(id) %>%
summarise(
safety = mean(Safety_Score, na.rm = TRUE),
travel = mean(Travel_Score, na.rm = TRUE),
closing = mean(Closing_Score, na.rm = TRUE)
)
COVIDpartial <- COVIDdatafiltered %>%
select(weargloves, coughsneezeelbow, washhands, wearmask, noseefriends, distance, lessgoout, nocrowded, riskaverse, country, id, female)
COVIDfull <- full_join(COVIDpartial, COVIDagree,
by = c("id"))
COVIDfull <- COVIDfull %>%
select(id, country, female, riskaverse, weargloves, coughsneezeelbow, washhands, wearmask, noseefriends, distance, lessgoout, nocrowded, safety, travel, closing)
COVIDhygiene <- COVIDfull %>%
pivot_longer(weargloves:wearmask, names_to = "PersonalHygiene", values_to = "HygieneScore")%>%
mutate(gender = ifelse(female == 1, "female", "male")) %>%
select(!female)
COVIDcontrol <- COVIDfull %>%
pivot_longer(safety:closing, names_to = "ControlType", values_to = "ControlScore")%>%
mutate(gender = ifelse(female == 1, "female", "male")) %>%
select(!female)
COVIDsocial <- COVIDfull %>%
pivot_longer(noseefriends:nocrowded, names_to = "SocialLife", values_to = "SocialLifeScore")%>%
mutate(gender = ifelse(female == 1, "female", "male")) %>%
select(!female)
The data were collected using Stata and saved in .dta format; we downloaded this file and read it in with the read_dta command. In the data wrangling stage of this project, we created data frames holding the important variables from the original dataset, and we created our artificial variables of safety, travel, and closing from averages of the attitude variables. Many participants had missing data for these items. We decided to calculate means based on the data that was available rather than remove participants for missing a single response; removing every individual with missing data would decrease our number of observations dramatically, which could pose a problem for the psychometric model fit.
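The effect of this choice can be seen in a minimal sketch (the values here are illustrative, not drawn from the dataset):

```r
# Illustrative only: one respondent's answers to three safety items,
# with the second item missing.
safety_items <- c(1, NA, 0)

# Without na.rm the mean is undefined; with na.rm = TRUE it averages the
# available answers, keeping the respondent in the sample.
mean(safety_items)                # NA
mean(safety_items, na.rm = TRUE)  # 0.5
```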
Before completing the psychometric modeling tests of fit, we explored the relationships of the variables in each latent factor. Our hygiene latent variable comprises weargloves, coughsneezeelbow, washhands, and wearmask. Higher values on this scale indicate an individual partakes in these hygiene practices frequently.
hygiene <- COVIDfull %>%
select(weargloves, coughsneezeelbow, washhands, wearmask)
pairs.panels(hygiene,
jiggle = TRUE,
ellipses = FALSE,
pch = ".",
factor = 2.5)
hygiene1 <- correlate(hygiene)
##
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'
network_plot(hygiene1, color = c("firebrick2", "#b2b9d2", "deepskyblue"), min_cor = 0, curved = TRUE)
Both of the plots above show the correlation within the hygiene dimension of COVID-19 attitudes. From the graphs, it is clear that wearing masks and wearing gloves are the most highly correlated variables, followed by the correlation between coughsneezeelbow and washhands.
We created both a plot for the expert and a plot for the layperson because each contains valuable information. The expert plot shows the distribution of the variables, which is critical for verifying linear relationships, and it also shows the correlation values. Our pairs panel plot shows mostly linear relationships, although the relationship between washhands and coughsneezeelbow does not appear perfectly linear. The network plot for the layperson extracts the data from the pairs panel but presents it in a digestible format: the color coding indicates positive and negative relationships, and the brightness indicates the strength of the relationships, making it easy to draw conclusions without much prior knowledge. We continued to use both the pairs panel and network plot as we explored the other two latent variables.
We moved on to the control latent variable, which comprises attitudes toward safety, travel, and closings. Higher values on this scale correspond to more approval of strict government control to prevent the spread of COVID-19.
control <- COVIDfull%>%
select(safety, travel, closing)
pairs.panels(control,
jiggle = TRUE,
ellipses = FALSE,
pch = ".",
factor = 2.5)
control1 <- correlate(control)
##
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'
network_plot(control1, color = c("firebrick2", "#b2b9d2", "deepskyblue"), min_cor = 0, curved = TRUE)
Both of the plots above show a strong correlation between all three variables within the government control dimension of COVID-19 attitudes.
Lastly, we examined the social life latent variable, which comprises not seeing friends, maintaining distance, going out less, and avoiding crowded areas. Higher values on this scale correspond to taking more precautions around socializing.
sociallife <- COVIDfull %>%
select(noseefriends, distance, lessgoout, nocrowded)
pairs.panels(sociallife,
jiggle = TRUE,
ellipses = FALSE,
pch = ".",
factor = 2.5)
social1 <- correlate(sociallife)
##
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'
network_plot(social1, color = c("firebrick2", "#b2b9d2", "deepskyblue"), min_cor = 0, curved = TRUE)
Both of the plots above show a similar strong correlation throughout the variables within the social life dimension of COVID-19 attitudes.
After understanding the relationships between variables, we tested our model structure using exploratory factor analysis. While our initial model was created from variables that seemed related to one another, exploratory factor analysis allows us to determine which variables map onto which latent factors, and how many latent factors our model should contain.
COVIDvariables <- COVIDfull %>%
select(noseefriends, distance, lessgoout, nocrowded, safety, travel, closing, weargloves, coughsneezeelbow, washhands, wearmask)
scree(COVIDvariables)
Looking at the factor analysis trend line, the scree plot indicates a four factor model would best fit our data, because the last significant slope change in the eigenvalues occurs at the 4th factor or component number.
vss(COVIDvariables)
##
## Very Simple Structure
## Call: vss(x = COVIDvariables)
## VSS complexity 1 achieves a maximimum of 0.74 with 1 factors
## VSS complexity 2 achieves a maximimum of 0.88 with 3 factors
##
## The Velicer MAP achieves a minimum of 0.04 with 3 factors
## BIC achieves a minimum of NA with 5 factors
## Sample Size adjusted BIC achieves a minimum of NA with 6 factors
##
## Statistics by number of factors
## vss1 vss2 map dof chisq prob sqresid fit RMSEA BIC SABIC complex
## 1 0.74 0.00 0.048 44 8.8e+03 0.0e+00 6.5 0.74 0.180 8391 8530.8 1.0
## 2 0.66 0.83 0.057 34 3.1e+03 0.0e+00 4.3 0.83 0.122 2798 2906.3 1.2
## 3 0.70 0.88 0.045 25 3.4e+02 8.9e-57 2.5 0.90 0.045 121 200.4 1.2
## 4 0.69 0.86 0.066 17 1.0e+02 1.8e-14 2.3 0.91 0.029 -44 9.6 1.4
## 5 0.51 0.82 0.100 10 4.1e+01 9.4e-06 2.1 0.92 0.023 -46 -13.9 1.6
## 6 0.63 0.84 0.150 4 2.9e+00 5.8e-01 2.0 0.92 0.000 -32 -19.3 1.6
## 7 0.59 0.84 0.224 -1 3.8e-01 NA 1.8 0.93 NA NA NA 1.6
## 8 0.64 0.84 0.326 -5 4.2e-05 NA 1.8 0.93 NA NA NA 1.6
## eChisq SRMR eCRMS eBIC
## 1 9.6e+03 1.2e-01 0.1339 9232
## 2 3.5e+03 7.3e-02 0.0924 3243
## 3 2.0e+02 1.7e-02 0.0256 -18
## 4 3.9e+01 7.7e-03 0.0138 -109
## 5 1.4e+01 4.5e-03 0.0106 -74
## 6 1.1e+00 1.3e-03 0.0047 -34
## 7 9.4e-02 3.8e-04 NA NA
## 8 1.3e-05 4.4e-06 NA NA
The vss plot is interesting because it indicates that fit worsens at two factors, improves at three, and then worsens again at four. The vss plot and MAP indicator advise three factors.
Because of the competing factor counts produced by the scree plot and vss output, we used oblique factor analysis to determine which variables should map onto each factor for both the 3 and 4 factor models. This allows us to create each model and eventually serves as the basis for comparing their fit.
COVID3_oblique <- fa(COVIDvariables, nfactors = 3, fm = "minres", rotate = "oblimin")
COVID3_oblique
COVID4_oblique <- fa(COVIDvariables, nfactors = 4, fm = "minres", rotate = "oblimin")
COVID4_oblique
The 3 factor model mapped no see friends, distance, less go out, no crowded, cough sneeze elbow, and wash hands onto one factor, then mapped safety, travel, and closing onto another factor, and lastly wear gloves and wear mask onto the last factor.
The 4 factor model mapped no see friends, distance, less go out, and no crowded onto one factor, then mapped safety, travel and closing onto another factor, then mapped wear gloves and wear mask onto another factor, and lastly mapped cough sneeze elbow and wash hands onto the last factor.
Now that we have our two competing models, we conducted confirmatory factor analysis to determine which of these two models best fit our data.
COVID3.model <-
'
protection =~ weargloves + wearmask
social =~ noseefriends + distance + lessgoout + nocrowded + coughsneezeelbow + washhands
control =~ safety + travel + closing
'
For the three factor model, we named the factors protection (willingness to wear masks/gloves), social (willingness to limit going out and seeing friends), and control (willingness to comply with government mandates). The variables mapped onto each factor were taken from the exploratory factor analysis.
COVID3fit <- cfa(COVID3.model, data=COVIDfull)
summary(COVID3fit, fit.measures = TRUE)
lavInspect(COVID3fit, what = "std")
The test statistic for our three factor model is 511.847 with 41 degrees of freedom, compared to the baseline model test statistic of 12010.358. The CFI is .961, the RMSEA is 0.062 with a confidence interval of [0.057, 0.067], and the SRMR is 0.036. These statistics indicate that the three factor model fits our data very well.
We then compared the model fit of the three factor model to the model fit of our four factor model.
COVID4.model <-
'
protection =~ weargloves + wearmask
social =~ noseefriends + distance + lessgoout + nocrowded
control =~ safety + travel + closing
hygiene =~ coughsneezeelbow + washhands
'
For the four factor model, we named the factors protection (willingness to wear masks/gloves), social (willingness to limit going out and seeing friends), control (willingness to comply with government mandates), and hygiene (willingness to engage in sanitary practices).
COVID4fit <- cfa(COVID4.model, data=COVIDfull)
summary(COVID4fit, fit.measures = TRUE)
lavInspect(COVID4fit, what = "std")
The test statistic for our four factor model is 415.743 with 38 degrees of freedom, compared to the baseline model test statistic of 12010.358. The CFI is .968, the RMSEA is 0.058 with a confidence interval of [0.053, 0.063], and the SRMR is 0.030. These statistics indicate that the four factor model also fits our data very well.
We tentatively chose the four factor model to proceed with because its statistics are closer to the ideal criteria; however, the practical improvement over the three factor model is small.
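Because the three factor model is nested in the four factor model, the reported fit statistics also admit a formal chi-square difference test; with the fitted lavaan objects available, anova(COVID3fit, COVID4fit) computes the same test directly. A minimal sketch from the values reported above:

```r
# Chi-square difference test for the nested 3- vs 4-factor models,
# recomputed from the test statistics reported in the text.
chisq_diff <- 511.847 - 415.743  # difference in model chi-square
df_diff    <- 41 - 38            # difference in degrees of freedom

# p-value for the chi-square difference
p_value <- pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)
p_value
```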
Before we commit to the four factor model for our data, we are going to test the reliability and validity of this model.
To test reliability, we are going to look at the correlation between the variables within each factor, as determined by the exploratory factor analysis. To begin, we examine the reliability of the protection latent factor, which is comprised of wearmask and weargloves.
ProtectionData <- COVIDfull %>%
select(weargloves, wearmask)
pairs.panels(ProtectionData, jiggle = TRUE, ellipses = FALSE, smoother = TRUE)
#protection2 <- correlate(ProtectionData, diagonal = 1)
#network_plot(protection2, color = c("firebrick2", "#b2b9d2", "deepskyblue"), min_cor = 0, curved = TRUE)
alpha(ProtectionData)
From the pairs panel visualization, there is only one correlation between the two variables in the factor, and it is moderately strong. We were unable to use a network plot for the protection factor because it only has two variables; if we could produce one, it would show a connection in a fairly saturated shade of blue to represent the .57 correlation between the two variables. The Cronbach’s alpha is .72, which falls in the good category, meaning the two items measure the protection factor consistently. In terms of reliability, dropping an item is not an option here, because a single remaining variable would no longer constitute a factor.
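For a two-item scale, the standardized alpha is a simple function of the inter-item correlation (the Spearman–Brown formula), which is why alpha adds little information beyond the correlation itself here. A quick sketch using the observed r = .57:

```r
# Standardized alpha for a 2-item scale via Spearman-Brown: 2r / (1 + r)
r <- 0.57  # observed weargloves-wearmask correlation
alpha_2item <- 2 * r / (1 + r)
round(alpha_2item, 2)  # close to the reported alpha of .72
```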
Because the protection factor had good reliability, we moved on to the control factor.
ControlData <- COVIDfull %>%
select(safety, travel, closing)
pairs.panels(ControlData, jiggle = TRUE, ellipses = FALSE, smoother = TRUE)
control2 <- correlate(ControlData)
##
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'
network_plot(control2, color = c("firebrick2", "#b2b9d2", "deepskyblue"), min_cor = 0, curved = TRUE)
alpha(ControlData)
The pairs panel visualization indicates the three variables are highly correlated but still differentiated, each capturing a different dimension of government control. The Cronbach’s alpha is .87, which is in the very good category; dropping any of the variables would reduce reliability.
Because the control factor had good reliability, we moved on to the hygiene factor.
HygieneData <- COVIDfull %>%
select(coughsneezeelbow, washhands)
pairs.panels(HygieneData, jiggle = TRUE, ellipses = FALSE, smoother = TRUE)
#hygiene2 <- correlate(HygieneData)
#network_plot(hygiene2, color = c("firebrick2", "#b2b9d2", "deepskyblue"), min_cor = 0, curved = TRUE)
alpha(HygieneData)
The pairs panel shows a moderate correlation between the two variables in the hygiene factor. We were unable to use a network plot for the hygiene factor because it only has two variables; if we could produce one, it would show a connection in a moderately saturated shade of blue to represent the .42 correlation. The Cronbach’s alpha is .59, which is unacceptably low. Because the other factors had good reliability and there was no significant difference in model fit, we will test whether the three factor model groups these variables in a more reliable manner.
The reliability statistics stay the same for the control and protection factors, because the 3 factor model only combined the hygiene and social factors from the four factor model.
After settling on our 3 factor model for the COVID-19 and gender attitudes data, we tested our research question. Using measurement invariance, we compared men and women within the United States, men and women within New Zealand, men across the United States and New Zealand, and women across the United States and New Zealand. By determining how the model fits each comparison group, we can begin to determine how gender and country affect attitudes toward COVID-19.
We began by comparing men and women within the United States.
COVIDfull <- COVIDfull %>%
transform(female = as.character(female))
COVIDUS <- COVIDfull %>%
filter(country == "USA")
COVID3.model <-
'
protection =~ weargloves + wearmask
social =~ noseefriends + distance + lessgoout + nocrowded + coughsneezeelbow + washhands
control =~ safety + travel + closing
'
fit_config1 <- cfa(COVID3.model,
data = COVIDUS,
group = "female")
summary(fit_config1, fit.measures = TRUE)
By testing the configural model, we ensure that the model structure is the same across groups; this serves as a structure test. The fit indices for the configural model were: CFI- .976; RMSEA- 0.053 [0.047, 0.059]; SRMR- 0.027. Because the fit indices align with the criteria, we know that males and females in the United States share the same model structure.
We then advanced to the next level of measurement invariance of checking the weak model.
fit_weak1 <- cfa(COVID3.model,
data = COVIDUS,
group = "female",
group.equal = c("loadings")
)
summary(fit_weak1, fit.measures=TRUE)
The weak model checks whether the unstandardized factor loadings are equivalent, meaning each item carries the same meaning for each factor across groups. Our fit indices indicated that the weak model fits our data: CFI- .977; RMSEA- 0.050 [0.044, 0.056]; SRMR- 0.027. Because the weak model fits, the items in our model load the same for both men and women in the United States.
We then advanced to the next level of measurement invariance of checking the strong model.
fit_strong1 <- cfa(COVID3.model,
data = COVIDUS,
group = "female",
group.equal = c("loadings", "intercepts")
)
summary(fit_strong1, fit.measures=TRUE)
Strong invariance checks whether the factor loadings and intercepts are equivalent, ensuring that men and women in the United States with the same latent score respond the same way to each variable. Our fit indices indicate that we can compare latent means across groups: CFI- .975; RMSEA- 0.049 [0.044, 0.055]; SRMR- 0.029.
We then advanced to the last level of measurement invariance.
fit_strict1 <- cfa(COVID3.model,
data = COVIDUS,
group = "female",
group.equal = c("loadings", "intercepts", "residuals")
)
summary(fit_strict1, fit.measures=TRUE)
Unfortunately, there was an error with our observed variables, so the strict invariance results cannot be interpreted; we cannot conclude whether the variability of each observed variable is equal across groups. Because strict measurement invariance cannot be assessed, our conclusion is that individual item means within each latent variable can be compared across men and women in the United States.
Following a similar pattern for comparing men and women in the United States, we compared men and women in New Zealand.
COVIDNZ <- COVIDfull %>%
filter(country == "NZ")
fit_config2 <- cfa(COVID3.model,
data = COVIDNZ,
group = "female")
summary(fit_config2, fit.measures = TRUE)
The fit indices for the configural model are: CFI- .950; RMSEA- 0.053 [0.043, 0.062]; SRMR- 0.042. Because the fit indices align with the criteria, we know that males and females have the same structure across our model in New Zealand.
We then advanced and tested the weak model.
fit_weak2 <- cfa(COVID3.model,
data = COVIDNZ,
group = "female",
group.equal = c("loadings")
)
summary(fit_weak2, fit.measures=TRUE)
The fit indices for the weak model are: CFI- .951; RMSEA- 0.050 [0.040, 0.059]; SRMR- 0.043. Because the weak model fits the data, the items in our model load the same for both men and women in New Zealand.
We then advanced to the strong model.
fit_strong2 <- cfa(COVID3.model,
data = COVIDNZ,
group = "female",
group.equal = c("loadings", "intercepts")
)
summary(fit_strong2, fit.measures=TRUE)
The fit indices for the strong model are: CFI- .937; RMSEA- 0.054 [0.045, 0.063]; SRMR- 0.048. Strong measurement invariance was met, indicating the predicted observed means of each variable are equal across men and women in New Zealand.
We then tested strict measurement invariance.
fit_strict2 <- cfa(COVID3.model,
data = COVIDNZ,
group = "female",
group.equal = c("loadings", "intercepts", "residuals")
)
summary(fit_strict2, fit.measures=TRUE)
The fit indices for the strict model are: CFI- .899; RMSEA- 0.065 [0.044, 0.055]. There is reason to believe strict invariance is not met, because the RMSEA value increased by 0.011, the length of the confidence interval. Erring on the conservative side, we conclude the means for each group are the same, but the variability of each observed variable differs.
Therefore, while the similarities in New Zealand seem to be rooted in the latent variable means, there are gender differences in the variability of the data that makes up those means.
After understanding the differences between women and men within the United States and New Zealand, we explored differences between gender groups across the countries to understand how each gender's attitudes toward the COVID-19 pandemic reflect the cultural environment. We began by comparing women in the United States and New Zealand.
COVIDfemale <- COVIDfull %>%
filter(female == "1")
fit_config3 <- cfa(COVID3.model,
data = COVIDfemale,
group = "country")
summary(fit_config3, fit.measures = TRUE)
The fit indices for the configural model are: CFI- .963; RMSEA- 0.058 [0.051, 0.065]; SRMR- 0.033. Because the fit indices align with the criteria, we know that females in the United States and New Zealand share the same model structure.
We then advanced and tested the weak model.
fit_weak3 <- cfa(COVID3.model,
data = COVIDfemale,
group = "country",
group.equal = c("loadings")
)
summary(fit_weak3, fit.measures=TRUE)
The fit indices for the weak model are: CFI- .959; RMSEA- 0.058 [0.051, 0.065]; SRMR- 0.038. Because the weak model fits the data, the items in our model load the same for women in the United States and New Zealand.
We then advanced to test the strong model.
fit_strong3 <- cfa(COVID3.model,
data = COVIDfemale,
group = "country",
group.equal = c("loadings", "intercepts")
)
summary(fit_strong3, fit.measures=TRUE)
The fit indices for the strong model are: CFI- .920; RMSEA- 0.078 [0.071, 0.084]; SRMR- 0.053. Strong measurement invariance was not met because of the increase in the RMSEA value, indicating the individual item means within each latent variable are unequal across women in the two countries.
After finding differences between women in the United States and New Zealand, we used measurement invariance to examine model differences for men between the United States and New Zealand.
COVIDmale <- COVIDfull %>%
filter(female == 0)
fit_config4 <- cfa(COVID3.model,
data = COVIDmale,
group = "country")
summary(fit_config4, fit.measures = TRUE)
The fit indices for the configural model are: CFI- .979; RMSEA- 0.047 [0.039, 0.055]; SRMR- 0.030. Because the fit indices align with the criteria, we know that males in both the United States and New Zealand have the same structure across our model.
We then advanced and tested the weak model.
fit_weak4 <- cfa(COVID3.model,
data = COVIDmale,
group = "country",
group.equal = c("loadings")
)
summary(fit_weak4, fit.measures=TRUE)
The fit indices for the weak model are: CFI- .973; RMSEA- 0.051 [0.044, 0.058]; SRMR- 0.039. Because the weak model fits the data, the items in our model load the same for men in the United States and New Zealand.
We then advanced to test the strong model.
fit_strong4 <- cfa(COVID3.model,
data = COVIDmale,
group = "country",
group.equal = c("loadings", "intercepts")
)
summary(fit_strong4, fit.measures=TRUE)
The fit indices for the strong model are: CFI- .951; RMSEA- 0.065 [0.044, 0.055]; SRMR- 0.049. Strong measurement invariance was not met because of the increase in the RMSEA value, indicating the individual item means within each latent variable are unequal across men in the two countries.
It seems that while the model fits the data as a whole, there were major differences between male attitudes in New Zealand and the United States, and between female attitudes in New Zealand and the United States.
latentMeans <- COVIDfull %>%
rowwise() %>%
mutate(protection = mean(c(weargloves, wearmask), na.rm = TRUE)) %>%
mutate(social = mean(c(noseefriends, distance, lessgoout, nocrowded, coughsneezeelbow, washhands), na.rm = TRUE)) %>%
mutate(control = mean(c(safety, travel, closing), na.rm = TRUE)) %>%
select(id, country, female, riskaverse, protection, social, control) %>%
mutate(RiskAverse = (10 - riskaverse)/10)
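The RiskAverse transformation above implies the raw riskaverse variable runs from 0 to 10, with higher raw scores indicating greater willingness to take risks (our reading of the coding, not stated explicitly in the dataset); the transformation reverses the direction and maps scores onto the 0 to 1 scale. A quick check:

```r
# Assumed coding: raw riskaverse runs 0-10, higher = more risk tolerant.
# Raw scores of 0, 5, and 10 then map to RiskAverse values of 1.0, 0.5,
# and 0.0, so higher RiskAverse marks a more risk-averse respondent.
raw <- c(0, 5, 10)
(10 - raw) / 10  # 1.0 0.5 0.0
```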
percentage <- latentMeans %>%
group_by(country, female) %>%
drop_na() %>%
summarize(
Control = sum(control)/ length(control),
Social = sum(social)/ length(social),
Protection = sum(protection)/ length(protection)
)
## `summarise()` regrouping output by 'country' (override with `.groups` argument)
LongMeans <- percentage %>%
pivot_longer(Control:Protection, names_to = "COVIDCategory", values_to = "COVIDScore") %>%
mutate(gender = ifelse(female == 1, "Female", "Male"))
ggplot(data = LongMeans, aes(x = COVIDCategory, y = COVIDScore, color = COVIDCategory)) +
geom_bar(stat = "identity", fill = "white") +
facet_wrap(~country + gender) +
labs(title = "COVID Response Across the Globe", x = "Covid Response", y = "Percent Agree") +
theme_pubr()
The plot above depicts the differences shown by the measurement invariance calculations. Within New Zealand, females and males seem to agree to a similar extent across the three factors, and a similar pattern is seen between female and male attitudes within the United States. However, when comparing females across the two countries, females in New Zealand tend to agree more with government control measures than females in the United States, while females in the United States tend to agree more with wearing protection. The same trend is seen in males: males in New Zealand are more likely to agree with government control measures, and males in the United States are more likely to wear protection to prevent COVID-19.
One possible explanation is that the New Zealand government acted quickly and closed its borders, increasing the agreement scores for control measures. Because of these actions, there are very few cases in the country as a whole, so citizens do not need to wear protection to keep themselves healthy, which would explain the lower protection agreement score. On the other hand, COVID-19 has become heavily politicized within the United States, making it harder for citizens to agree with government control measures; because cases are rising, Americans must continue to wear protection to avoid contracting the virus, making the protection score higher.
Our plot is easy for the layperson to understand because it lays out the percent-agree measures for each country and gender clearly. We chose a layperson plot to explain these results because it captures the complex output of the measurement invariance calculations well, and the color coding allows comparisons and general conclusions to be drawn without much inspection.
Because of these differences in the control and protection factors between the United States and New Zealand, we explored the distribution of data points within each factor.
Protection <- latentMeans %>%
select(id, country, protection)
pirateplot(data = Protection, protection ~ country, theme = 3, main = "Protection Measures")
For the protection factor RDI plot, there are clear differences in the mean and distribution of data points between New Zealand and the United States. The biggest bulk of New Zealand data points sits at the 0 mark, indicating that many respondents do not practice wearing protection, such as gloves or masks. In the United States, however, the biggest bulk of data points resides at the 0.55 mark, indicating that individuals wear protection against COVID-19 more often than not.
We chose a plot geared toward an expert for this graph because a layperson may not want to dive into the details of the distributional differences. These differences are a complex topic and hard to demonstrate with simple graphs. The RDI plot shows the variability of the data visually, which is ideal for an expert looking for further information.
Control <- latentMeans %>%
select(id, country, control)
pirateplot(data = Control, control ~ country, theme = 3, main = "Control Measures")
Within the control factor, people living in New Zealand were much more receptive to government control than individuals in the United States. The biggest bulk of New Zealand data points is around 0.8, indicating individuals agreed with most of the government control measures put in place. In the United States, however, the largest bulk of data points is at 0, meaning individuals did not agree with the government control policies.
Overall, differences in attitudes toward specific aspects of the COVID-19 pandemic are evident across countries and may be one reason for the drastic differences in the number of cases in each country.
Part of our research question was to determine differences in COVID-19 attitudes between males and females from countries with different gender equality ratings. After finding group comparison differences, we were able to draw conclusions about them. Now, however, we are interested in exploring whether taking more precautions, such as wearing protection and socializing less, and agreeing to government control predict how risky an individual is willing to be in the midst of a health pandemic. To explore this question, we ran a multiple regression.
onlylatent <- latentMeans %>%
select(protection, social, control, RiskAverse)
We explored the correlations among our three factors from the model and their relationship to risk aversion.
pairs.panels(onlylatent,
jiggle = TRUE,
ellipses = FALSE,
pch = ".",
factor = 2.5)
correls <- correlate(onlylatent)
##
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'
network_plot(correls, color = c("firebrick2", "#b2b9d2", "deepskyblue"), min_cor = 0, curved = TRUE)
The strongest relationship in the pairs panel and network plot is between the latent variables social and control. Unfortunately, no factor strongly correlates with risk aversion; the strongest correlation is -.14, between government control and risk aversion. However, there does appear to be a non-linear relationship between the social factor, the control factor, and risk aversion.
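These pairwise correlations can also be checked directly, which is a quick sanity check on the values reported in the pairs panel. A minimal sketch, assuming the `onlylatent` data frame built above is available:

```r
# Pearson correlations of each factor with risk aversion,
# handling missing values pairwise as correlate() does above.
cor(onlylatent$control,    onlylatent$RiskAverse, use = "pairwise.complete.obs")
cor(onlylatent$social,     onlylatent$RiskAverse, use = "pairwise.complete.obs")
cor(onlylatent$protection, onlylatent$RiskAverse, use = "pairwise.complete.obs")
```

The control correlation should be the largest in magnitude (around -.14), matching the network plot.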
regressionmodel <- lm.beta(lm(RiskAverse ~ protection + social + control, data = latentMeans))
summary(regressionmodel)
protection <- lm.beta(lm(RiskAverse ~ protection, data = latentMeans))
summary(protection)
social <- lm.beta(lm(RiskAverse ~ social, data = latentMeans))
summary(social)
control <- lm.beta(lm(RiskAverse ~ control, data = latentMeans))
summary(control)
Multiple regression model: While controlling for social and control, for every 1-unit increase in protection, the risk aversion score increased by .9611. While controlling for protection and control, for every 1-unit increase in social, the risk aversion score decreased by 1.6737. While controlling for protection and social, for every 1-unit increase in control, the risk aversion score decreased by .8805. The r^2 from the regression model was .039, unfortunately meaning our regression model explained only about 4% of the variance in risk aversion. The individual r^2 value for the protection variable is 0.0072, the r^2 for social is 0.01136, and the r^2 for control is 0.02064. Because the control variable's individual r^2 is closest to the full multiple regression r^2, we can conclude that the control factor carries most of the weight in predicting risk aversion, compared to social or protection.
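The r^2 comparison above can be pulled straight from the fitted model objects rather than read off the printed summaries. A minimal sketch, assuming the model objects created in the chunks above (a `summary()` of an `lm.beta` object carries the same `r.squared` component as `summary.lm()`):

```r
# Variance explained: full model versus each single-predictor model.
summary(regressionmodel)$r.squared  # full model (~ .039)
summary(protection)$r.squared       # protection alone
summary(social)$r.squared           # social alone
summary(control)$r.squared          # control alone, closest to the full model
```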
Because only about 4% of the variance in risk aversion is explained by our multiple regression model, we suspected the non-linear relationship shown by the social factor may have contributed to this. We decided to run another multiple regression, including only individuals who scored at least 0.5 on the social factor.
socialabove5 <- latentMeans %>%
filter(social >= 0.5) %>%
select(protection, social, control, RiskAverse)
pairs.panels(socialabove5,
jiggle = TRUE,
ellipses = FALSE,
pch = ".",
factor = 2.5)
The pairs panel including only those with a social score of at least 0.5 shows a more linear relationship than the previous figure. We then ran another multiple regression in the same format as the previous model on this subset of the data.
socialmodel <- lm.beta(lm(RiskAverse ~ protection + social + control, data = socialabove5))
summary(socialmodel)
Unfortunately, although the relationship seemed more linear, only 4.5% of the variance in risk aversion was explained by this model.
plotRiskAversion <-plot_ly(data = latentMeans,
x = ~protection,
y = ~social,
z = ~control,
color = ~RiskAverse, colors = c('blue','purple', 'red'), marker= list(size = 2)) %>%
add_markers() %>%
colorbar(title = "Risk Aversion Score ") %>%
layout(title = "Attitude Toward Risk Aversion",
scene = list(xaxis = list(title = 'Protection Score'),
yaxis = list(title = 'Social Score'),
zaxis = list(title = 'Control Score')))
## Warning: Ignoring 3089 observations
plotRiskAversion
The 3-D toggle graph above helps visualize the results of the multiple regression. The three axes are the factors from our model, with the colored dots representing risk aversion. The graph shows no clear pattern separating individuals who score high on risk aversion from those who score low, which is consistent with our regression model explaining so little of the variance.
While our regression model did not have strong predictive power for the risk aversion variable, we are still interested in seeing how risk taking compares between the genders and across the countries.
means <- latentMeans %>%
mutate(gender = ifelse(female == 1, "Female", "Male")) %>%
group_by(country, gender) %>%
summarize(
meanRisk = mean(RiskAverse, na.rm = TRUE),
meanControl = mean(control, na.rm = TRUE),
meanSocial = mean(social, na.rm = TRUE),
meanProtection = mean(protection, na.rm = TRUE)
)
## `summarise()` regrouping output by 'country' (override with `.groups` argument)
ggplot(data = means, aes(x = reorder(country, meanRisk),
y = meanRisk, fill = meanRisk)) +
geom_bar(stat = "identity") +
facet_wrap(~gender) +
labs(title = "Risk Aversion by Country and Gender", x = "Country", y = "Risk Averse") +
theme_pubr()
The layperson plot above shows the mean risk aversion score by country and gender. From this graph, it is easy to observe that females tend to take more risks than males, given their lower risk aversion scores, and that within each gender, participants from the USA tended to avoid risky decision making, although only by a small margin. A layperson graph is useful for this data because the means can be quickly interpreted, helping to draw overarching conclusions about risk aversion that the multiple regression did not provide.
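The gender gap visible in the bar chart can also be quantified directly. A minimal sketch, assuming the `means` summary table built above and that tidyr is available alongside dplyr:

```r
# Spread gender into columns and compute the male-female difference
# in mean risk aversion within each country.
riskGap <- means %>%
  select(country, gender, meanRisk) %>%
  tidyr::pivot_wider(names_from = gender, values_from = meanRisk) %>%
  mutate(gap = Male - Female)  # positive gap: males more risk averse
riskGap
```

A positive `gap` in both countries would confirm the pattern described above without relying on visual inspection of the bars.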
While it is impossible to fully remove bias from any interpretation of data, we are cognizant of our ethnic, cultural, socio-economic, and gender backgrounds and have done our best to keep our biases from strongly influencing our work. In interpreting the data, we let the regression analysis speak for itself and did not add explanations or assumptions that were not supported by the data.
We have made a concerted effort to include intuitive visualizations that help to communicate our correlation, factor, and regression analyses in a clear and simple manner. The network plots help the reader to understand the strength of the correlations between variables, while the interactive 3-D plot allows the reader to make sense of the regression analysis.
We examined the data and recognize that there could be some bias in comparing two very different countries, the United States and New Zealand. Specifically, the data for the variable “wearmask” may be skewed because not wearing a mask may not indicate reckless behavior in New Zealand, since the country was able to lock down effectively and beat the virus early on. Additionally, we were careful not to assume that someone who did not wear a mask or practice socially distant behaviors was reckless; they may instead have lacked the means to purchase a mask or had to continue working in hazardous conditions.
Unfortunately, we both have similar backgrounds and are not the most diverse group. For the purposes of this class, our lack of diversity is acceptable, but if we were publishing our data analysis for public use, we would add people to our group to reflect more diversity of background and opinion. Adding diversity to a group increases the range of ideas produced, which would help combat maladaptive interpretations and expand our results.
If we were to publish our data analysis, we would have to contact the researchers who collected the data and determine the process for obtaining proper consent from the study participants to use their data in a published document.
We have made a concerted effort not to make irresponsible conclusions based on our data. In addition, our data analysis is not being published publicly, so it will not be seen (and potentially used for nefarious purposes) by anyone outside this class. In the unlikely event that someone were harmed by the results of our analysis, we would identify the issue, make immediate corrections, and issue a public apology.
We have aimed to be as fair as possible to all of the different user groups in our analysis. We have done this by basing our hypothesis on verified gender equality data and by not making conclusions about these groups that were not backed up by data. In fact, our regression analysis results were not strong, so we could not make strong conclusions about the groups we examined. While we did note differences between gender groups across countries, we remain aware of the binary structure of the gender data as we interpret those results.
The data we used was collected by researchers who have made it available to the public. While the original data is public, our analysis and interpretations will be kept private because we will not be publishing our work. It is highly unlikely that our analysis would be stolen, but anyone attempting this would have to hack into our RStudio Cloud accounts, which would be very difficult and would likely fail.
Ultimately, the regression analyses do not have enough explanatory power to support our hypothesis. We think that researching this phenomenon further, with more data, could yield significant insights into how different countries, and the gender groups within them, will react to future medical crises based on data from the COVID-19 pandemic. Going forward, we should find additional data that captures risk aversion with more than one variable; multiple indicators would give the risk aversion construct explanatory power comparable to that of our multi-variable factors. We should also seek a dataset covering more countries, which would allow us to compare countries with significantly different gender equality scores but more similar population sizes. Finally, we should seek data that includes more than two genders; a significant amount of the world's population does not identify within the gender binary, and their experiences should be recognized.